Multivariate Extreme Value Theory Based Rate Selection for Ultra-Reliable Communications

Diversity schemes play a vital role in improving the performance of ultra-reliable communication systems by transmitting over two or more communication channels to combat fading and co-channel interference. Determining an appropriate transmission strategy that satisfies ultra-reliability constraint necessitates derivation of statistics of channel in ultra-reliable region and, subsequently, integration of these statistics into rate selection while incorporating a confidence interval to account for potential uncertainties that may arise during estimation. In this paper, we propose a novel framework for ultra-reliable real-time transmission considering both spatial diversities and ultra-reliable channel statistics based on multivariate extreme value theory. First, tail distribution of joint received power sequences obtained from different receivers is modeled while incorporating inter-relations of extreme events occurring rarely based on Poisson point process approach in MEVT. The optimum transmission strategies are then developed by determining optimum transmission rate based on estimated joint tail distribution and incorporating confidence intervals into estimations to cope with the availability of limited data. Finally, system reliability is assessed by utilizing outage probability metric. Through analysis of data obtained from the engine compartment of Fiat Linea, our study showcases the effectiveness of proposed methodology in surpassing traditional extrapolation-based approaches. This innovative method not only achieves a higher transmission rate, but also effectively addresses stringent requirements of ultra-reliability. The findings indicate that proposed rate selection framework offers a viable solution for achieving a desired target error probability by employing a higher transmission rate and reducing the amount of training data compared to conventional rate selection methods.


I. INTRODUCTION
Ultra-reliable communication (URC) plays a critical role in fifth-generation (5G) and beyond networks, enabling missioncritical applications in transportation, manufacturing, and healthcare.Examples of these applications include remote control of robots, remote surgery, wireless sensor networks within vehicles, autonomous vehicles, and vehicular teleoperation [1]- [5].In current 5G standards, ultra-reliability and low latency are typically combined to form ultra-reliable low latency communication (URLLC).However, there are specific scenarios, such as health monitoring or disaster recovery, where the highest level of reliability is essential, but a slightly higher latency of more than the conventional 1 millisecond (ms) is acceptable [5], [6].To achieve the ultra-reliability target of 10 −9 to 10 −5 packet error rate (PER), diversity schemes including time diversity [7], frequency diversity [8], spatial diversity [9], and interface diversity [10] have been proposed.These schemes aim to reduce the required signal-to-noise ratio (SNR) to meet the stringent reliability constraints.Spatial diversity, which involves using multiple transmit and/or receive antennas to achieve reliable communication over fading channels, is preferred over frequency and interface diversities due to its cost-effectiveness and minimal transmission delay compared to time diversity [11].However, implementing ultra-reliable communication relies on accurately modeling extreme quantiles that occur infrequently.Therefore, significant advancements in statistical modeling of multiple channels and the development of corresponding transmission strategies are necessary for the successful utilization of spatial diversity in designing ultrareliable communication systems.
In a URC system with spatial diversity, a proper channel model can be derived by i) estimating the channel statistic within a more nuanced notation of coherence distance for ultra-reliable communication [12], ii) using machine learning data-driven frameworks [13], or iii) incorporating novel techniques to the characteristics associated with extreme events across multiple variables, such as proposing the utilization of power law expression techniques by extrapolating from channel data [5] and multivariate extreme value theory (MEVT) [14].In [12], adjustments to the coherence distance definitions are made to accurately capture the distance at which wireless channels can be reliably predicted.The authors employ a quasi-static Rayleigh fading model, assuming spatial independence, to study channel dynamics in the context of URC.Furthermore, [13] addresses the estimation of channelblocking incidents with probabilities on the order of 10 −9 -10 −5 using data-driven methods, without prior knowledge of dynamic channel statistics.This allows for the assessment of wireless connectivity reliability in ultra-reliable communication systems.However, machine learning data-driven approaches require a significant amount of training samples, typically exceeding a factor of (10× −1 ) to achieve the desired error probability .Additionally, [5] proposes a power law framework that extends traditional channel models into the ultra-reliable region through extrapolation.This framework can be further enhanced to accommodate multiple receivers by employing a simplified expression of maximum ratio combining (MRC).Nevertheless, these initial studies still rely on average statistics channels, such as Gaussian, Rayleigh, or Rician, to characterize channel behavior in the ultra-reliable region.Only recently, a novel methodology utilizing MEVT has been introduced in [14] to model multiple channels in a multiple-input multiple-output (MIMO)-URC system by deriving the lower tail statistics of received power.MEVT is a statistical discipline specializing in formulating methods for modeling dependencies among infrequent events by leveraging multidimensional boundary relationships.Despite numerous studies in the field of channel modeling, none have proposed a transmission strategy that relies on real-time assessment of the lower tail statistics of received power across multiple diversities.
The design of communication systems for ultra-reliable communications has been addressed in the context of adaptive relay selection techniques [15], and rate selection frameworks [6], [16], [17].A cooperative communication-based relaying scheme has been proposed in [15] to provide ultra-high reliability while meeting the latency requirements at a moderate signal-to-noise ratio (SNR) by modifying the definition of traditional coherence time and revising fading dynamics of wireless channels in the context of ultra-high reliability.On the other hand, [6] uses a rate selection framework based on the extrapolation of average statistic-based channel models to design an ultra-reliable system.Moreover, [16] develops a data-driven rate selection framework where federated learning (FL) training has been used to estimate the transmission rate and assess the reliability of the system with minimum training time in the absence of knowledge about the channel state information (CSI).However, all of these designs use the extrapolation of average-statistics channel models, which have been demonstrated not to fit the empirical data in the ultrareliable region [14], [17]- [19].We have recently proposed a novel extreme value theory (EVT)-based framework in [17] for estimating and validating the optimal transmission rate to address the constraints of ultra-reliable communications for a single transmitter-receiver pair.Nevertheless, none of these studies incorporate the statistics of multiple channels in the ultra-reliable region based on real data into the design of communication systems.
The estimation accuracy of the channel parameters and the transmission strategies need to be derived in the ultrareliable regime due to the limited amount of training data by using confidence interval (CI) [13], [20], [21] and bootstrapping techniques [18].In [13], the authors propose a multi-layer perceptron (MLP) neural network (NN)-based machine learning data-driven framework incorporating CI to the statistics of a non-blocking connectivity duration to maintain the URLLC with the penalty of large training samples to satisfy the target error probability .In [20], the authors evaluate the outage probability within a confidence level to address the reliability of the URC system by incorporating a non-parametric statistical learning algorithm into a rate selection framework assuming extrapolation of the average statistics fading channels.Only recently, in [21], we proposed a pioneering framework based on EVT that estimates the optimal transmission rate with a high degree of confidence using a reduced number of training samples.The proposed approach was further validated by evaluating the outage proba-bility for ultra-reliable communications in a single transmitterreceiver scenario.On the other hand, in [18], we develop a bootstrapping-based algorithm for the determination of the stopping conditions in collecting sufficient samples to estimate the tail statistics based on the normality assumptions of the return levels.Accordingly, if the sample size is sufficiently large to estimate the channel tail distribution of values exceeding a predetermined threshold by using the generalized Pareto distribution (GPD), the corresponding return levels obtained in the bootstrap iterations are normally distributed.However, none of these studies incorporate ultra-reliable communication statistics of the diversities into the accuracy of simultaneously estimating the channel parameters and resulting transmission strategies.
The main objective of this study is to propose a transmission strategy for a real-time URC system that employs spatial diversity by considering the MEVT-based statistics of multiple channels while incorporating confidence intervals into parameter estimation to cope with the limited availability of data.The spotlight is on two-dimensional or bi-variate instances, with the aim of illustrating key concepts and concerns related to MEVT while avoiding the added complexity associated with a comprehensive multivariate approach.However, this methodology can be easily generalized to multiple dimensions.According to the proposed methodology, the parameters of multiple channels are first estimated based on MEVT.Then, the optimal transmission rate is determined by incorporating the estimated multiple channel parameters while considering the ultra-reliability constraint subject to the target error probability .Finally, the reliability of a system operating at the determined transmission rate is assessed by employing the outage probability.The uncertainty of the transmission rate is also obtained in relation to the confidence intervals of the estimated channel parameters.The original contributions of the paper are listed as follows: • We propose a novel framework for ultra-reliable realtime transmission considering both spatial diversities and ultra-reliable channel statistics based on MEVT, for the first time in the literature.The framework includes the modeling of the inter-relationship of the tail distributions of multiple channels by using the bi-variate GPD (BGPD) based on the Poisson point process approach, determination of the optimum transmission rate by using the estimated BGPD model, incorporating confidence intervals to the estimated transmission rate due to the restricted amount of data, and then assessment of the system reliability by utilizing the outage probability metric.• We propose a novel algorithm for the rate selection in a multi-diversity system based on the Poisson point process approach, for the first time in the literature.The algorithm incorporates the parameters of the probability measure function and BGPD into the optimum transmission rate to satisfy the ultra-reliability constraints.• We propose an algorithm for the determination of the confidence interval of the transmission rate based on the confidence intervals of the estimated Uni-variate GPD (UGPD) parameters, for the first time in the literature.
We determine the confidence intervals of the UGPD parameters by incorporating the bootstrapping-based biascorrected accelerated (BCA) method that offers an order of magnitude improvement in accuracy for the parameter estimation using the maximum likelihood estimator (MLE).• We utilize a combination of one omnidirectional transmitter and two directional receivers to capture the data within the engine compartment of a Fiat Linea vehicle under different driving scenarios and engine vibrations.Through our proposed methodology, we achieve superior modeling accuracy for URC compared to conventional extrapolation-based models that rely on average statistical channels for multi-channel modeling.The rest of the paper is organized as follows: Section II describes the system model and assumptions considered throughout the paper.Section III presents the MEVT-based rate selection framework for determining the optimum transmission rate and its affiliated confidence interval for a system operating in URC.Section IV provides the channel measurement setup and the performance evaluation in determining the optimum transmission rate and outage probability, as well as the confidence interval for the estimated rate.Finally, concluding remarks and future works are given in Section V.

II. SYSTEM MODEL
We consider a one-way communication in which a transmitter (Tx) sends data packets to two receivers, namely Rx1 and Rx2, at the total rate , addressing the requirements of ultra-reliable communication.The framework can be easily extended for the scenario in which more than two receivers exist without loss of generality.Assuming a fixed and predetermined transmit power, estimating the received signal power is tantamount to determining the squared amplitude of the channel state information [6], [18].
At the training phase and before the transmission starts, the channel samples are collected from receivers Rx1 and Rx2 and converted into independent and identically distributed (i.i.d.) sample sequences denoted by and , respectively, by applying declustering method [18], [22].The channel data in the training phase can be collected by using either the pilot signals or the records from the prior data transmissions.The recommended amount of required samples in the training phase is about 1/ , where is the target error probability on the order of 10 −9 -10 −5 for the URC system.Suppose the channel is stationary according to the Augmented Dickey-Fuller (ADF) test results.In that case, the channel tail distribution is modeled by applying EVT to the received signal powers and estimating the parameters of the UGPD fitted to the channel tail distribution.Otherwise, external factors leading to variation over time in the parameters of the UGPD are identified and utilized to partition the sequence into stationary groups of size .Then, EVT is applied to each stationary sequence for estimating the shape and scale parameters of UGPD as a change-point function of time, as explained in detail in [19].The parameters of UGPDs obtained for the received powers of Rx1 and Rx2 are then mutually used to determine the tail distribution of the joint probability distribution of multiple channel sequences and by using MEVT techniques to characterize the statistics of the inter-relationships of extreme events [14].The transmitter assumes that the main source in the block fading channel is link outage [5], [6], [23].
Upon deriving MEVT-based channel model, transmission strategies are determined to estimate the optimum transmission rate with the goal of fulfilling a predetermined reliability constraint such that where ( , ) denotes the transmission rate estimated based on the training received power samples of and , is the joint CDF of the training samples from channel data and , and ( ( , )) is the outage probability at transmission rate ( , ) defined as where is any received power from test samples of receiver Rx1 or Rx2.The transmitter determines the optimum rate as a function of as follows: where −1 ( ) is the -quantile of , and is determined as a function of and the parameters of with the goal of satisfying constraint (1).
Upon determining the optimal transmission rate using the proposed MEVT rate selection methodology at target error probability , the confidence intervals of the estimated UGPD parameters are computed, considering some uncertainty in the parameter estimation by incorporating the probability of wrong decision .Subsequently, the statistics of transmission rate are estimated using MEVT, which incorporates the confidence intervals of the estimated UGPD parameters to satisfy the target error probability for a system equipped with spatial diversity operating in the ultra-reliable communication domain.

III. MEVT-BASED RATE SELECTION FRAMEWORK
The goal of the MEVT-based rate selection framework is to determine an optimum transmission rate along with the confidence interval based on the estimation of the lower tail statistics of multiple channels for real-time communication in the ultra-reliable regime.The proposed methodology involves a series of steps, including the conversion of the received power sample sequences into independent and identically distributed (i.i.d) samples.This is achieved through the application of the declustering method, which removes interdependency between the samples.Then, the inter-relationship of bi-variate extremes is modeled by utilizing BGPD based on the MEVT-Poisson point process approach.Upon determining the optimum transmission rate based on the proposed MEVT-based rate selection framework, channel reliability is assessed based on the results of outage probability.Finally, the confidence interval is incorporated into the proposed rate )

A. Bi-variate Channel Estimation
At the training phase, the channel samples are collected from receivers Rx1 and Rx2 and converted into i.i.d.samples through the declustering approach.In this approach, the samples are folded into multiple clusters, and then the minimum of each cluster is extracted; the minima are considered i.i.d.samples [18], [22].Upon applying EVT to the resulting sequence of i.i.d.samples from individual channel sequences, the optimum thresholds are determined separately by applying two complementary methods based on EVT: mean residual life (MRL) and parameter stability methods [18], [22].According to the MRL method, the optimum threshold is the highest threshold below which the mean of values exceeding a given threshold is linear with respect to the threshold.On the other hand, the optimum threshold based on the parameter stability method is the highest threshold below which the estimated shape and modified scale parameters are linear against the threshold [18], [22].The MRL method can be applied prior to the estimation of the tail distribution by using UGPD, while the parameter stability method should be applied after deriving the tail statistics by using UGPD.Since the bi-variate analysis are limited to time periods where both thresholds are exceeded, in this study, only the exceedances that occur simultaneously are considered, while any exceedances that occur at different time intervals are excluded.Following this, the parameters of the UGPD are estimated for the exceedances of each channel data, using the maximum likelihood (ML) estimator with the corresponding optimal thresholds.The UGPD fitted to the tail of sample sequence is formulated as where ( − ) denotes the non-negative exceedance for observation ∈ below optimum threshold ; and and ˜ are the estimated shape and scale parameters of UGPD, respectively.The validity of the fitted UGPD model to the tail distributions is assessed by using the probability plots, including the probability-probability (PP) plot and the quantile-quantile (QQ) plot.The PP and QQ plots are graphical techniques that are employed to compare the empirical values with the corresponding modeled results.In the PP plot, the empirical CDF of the occurrence for each extreme value is plotted against the corresponding CDF obtained by the UGPD, while in the QQ plot, the empirical extreme quantile is plotted against the corresponding quantile obtained by the inverse of UGPD [18].If the UGPD properly characterizes the extreme values exceeding threshold , then both PP and QQ plots conform to the unit diagonal line, i.e., a straight line with the horizontal angle 45 • .Thereafter, the inter-relationship of bi-variate extremes is modeled by applying the Poisson point process approach.
In bi-variate extreme analysis, the first step involves examining the presence of dependency within the tail samples and the total samples.The feasibility of spatial diversity is evaluated using the correlation between received powers in the total samples.Spatial diversity is considered reasonable if the correlation coefficient is within the range of 0.1 and 0.5 among the total samples, since otherwise fading occurs simultaneously at different links [24].On the other hand, if the extremes of the two sample sequences are independent, i.e., the correlation coefficient approaches 0, there is no need for bivariate modeling.However, if there is correlation between the tail samples, it is necessary to investigate the inter-relationship of the extreme values of receivers.
In the Poisson point process approach, the channel data is transformed in two steps: i) Fréchet transformation and ii) Pickands coordinates.The Fréchet transformation is applied on the obtained tail sequences of and as follows: and where = ( < ) and = ( < ) for optimum thresholds and , respectively.Accordingly, the marginal distribution of variables ˜ and ˜ approximately have Fréchet distribution for < , ∈ , and < , ∈ .After applying the Fréchet transformation, the pseudo polar Pickands transformation is applied by introducing pseudo polar radial and angular components as where ∈ is the length of vector ˜ and ˜ , and ˜ and ˜ are the Fréchet transformation of and , respectively.Upon determining the probability measure function of the Pickands angular coordinate ( ), satisfying the mean constraint ∫ 1

B. Bi-variate Rate selection
The goal of this section is to choose the maximal transmission rate that achieves the target reliability by satisfying constraint (1).The maximum rate is determined as a function of as where −1 ( ) is an estimate of a positive -quantile of the bi-variate channel .Here, the objectives are i) to determine −1 ( ) and ii) to find that maximizes the transmission rate ( ) while satisfying the target reliability based on constraint (1).
In order to determine −1 ( ), we first obtain the inverse joint CDF function of the training samples from channel data and .
Since determining the inverse of function is a non-trivial task, the Newton-Raphson method is utilized to determine the inverse of ( ˆ ; ˆ , ˆ ) [26].Newton-Raphson method is a recursive approximation technique for finding the root and inverse of a differentiable function.
The optimum transmission rate based on the proposed rate selection framework is then determined by substituting the results of Theorem 1 into (11) as for the maximum allowed error probability , ≤ , where −1 ˆ (.) denotes the CDF-inverse of -distribution fitted to the angular component ˆ with shape parameters ˆ and ˆ , denoted by ( ˆ ; ˆ , ˆ ).

Suppose (
) is a valid transmission rate.In that case, the corresponding outage probability ( ( )) is expected not to exceed the target reliability , according to constraint (1).The average outage probability for the transmission rate obtained in the previous section is obtained by substituting (16) into the outage probability equation expressed in (2) as where is the total number of samples including training and test samples, and (.) is the CDF of -distribution with parameters and fitted to , i.e., ( ; , ), assuming the channel is perfectly known.In order to determine the maximum allowed error probability , the constraint (1) should be satisfied as where −1 ( , ) denotes the -quantile of the -distribution with parameters and fitted to , and −1 ( , ) ˆ , ˆ is CDF of -distribution with parameters ˆ and ˆ fitted to ˆ and computed at point −1 ( , ).

D. Confidence interval
The confidence interval for the transmission rate can be calculated by deriving confidence intervals for the estimated UGPD parameters.Although various methods can be used for computing confidence intervals, bootstrapping methods are mostly preferred due to their higher accuracy and avoidance of assumptions on the properties of the distribution of the original samples.Bootstrapping methods generate confidence intervals for statistical parameters by drawing pseudo-samples either from the original sample sequence or a model fitted to the original sample.In the realm of statistics, conventional bootstrapping-based methods are highly dependent on the asymptotic normality of parameter estimate to construct a standard confidence interval for estimation of the parameter ˆ .However, a standard confidence interval might not be a proper way of considering the estimation uncertainty for many applications, especially when the distribution of the parameter estimation, i.e., ˆ , is not normal.Alternatively, the bootstrapbased bias-corrected and accelerated (BCA) method has been demonstrated to provide higher accuracy with narrower CIs correcting (i) the non-normality of ˆ by utilizing the bootstrap distribution where a parametric model assumption is made for ˆ , (ii) the bias of ˆ by using the bias correction parameter, and (iii) the non-constant standard error of ˆ by introducing the acceleration parameter [27], [28].
1) Fundamentals of bootstrap-based BCA: bootstrap samples are generated at each bootstrapping round with replacement, and then an estimate of the parameter is obtained as ˆ at the -th round for ∈ {1, 2, ..., }.The BCA method is based on the assumption that there is a monotone transformation ˆ = ℎ( ˆ ) such that ˆ ∼ ( − where Φ is the standard normal CDF and ˆ (.) is the empirical probability of the estimated parameter ˆ and calculated as in which is the number of estimated parameters of bootstrapping ˆ smaller than the estimated parameter based on the original samples ˆ .Please note that ˆ is the estimated parameter from the original samples while ˆ is the estimated parameter from bootstrapping the original samples.
The acceleration factor ˆ can be obtained based on jackknife re-sampling (or leave-one-out procedure) [28]- [30] by incorporating replicates of the original samples, where is the total number of the original samples.Jackknife resampling involves generating a series of jackknife replicates by repeatedly leaving out one observation from the original sample.Specifically, the first replicate is generated by excluding the first observation, the second replicate is generated by excluding the second observation, and so on until replicates of size −1 are obtained.For each sample number , ∈ {1, 2, ..., }, the estimated parameter ˆ is obtained.Then, the average of these estimations is calculated as ¯ = 1 =1 ˆ .The acceleration factor is computed as Finally, the confidence interval is constructed as where the estimated ˆ 's are ordered from smallest to the largest, = ˆ 1 × ( + 1), and and ˆ 2 defined as and where is the 100(1− 2 ) percentile of the standard normal distribution for significance level (or probability of wrong decision) [30].
2) BCA in CI determination of UGPD parameters: The estimated scale and shape parameters of UGPD are not exactly ˆ , ˆ and typically take values within the confidence intervals [ ˆ , ˆ ] and [ ˆ , ˆ ], respectively.The BCA-based algorithm that is utilized to determine the upper and lower bounds of the CI of the shape and scale parameters are given in Algorithm 1 and is described in detail as follows.The inputs of the algorithm are the observation sample set ; the significance level , depending on the application and how much error is allowed in the parameter estimation; and the number of newly generated data sets in the bootstrapping process denoted by .The algorithm starts by bootstrapping the original data set for times and storing the new data sets in the vectors denoted by , ∈ {1, 2, ... } (Lines 1−2).The UGPD is fitted to the tail of data set , and the parameters of UGPD are estimated as ˆ and ˆ for ∈ {1, 2, ... } (Line 3).Upon estimating the UGPD parameters for each data set , the bias correction factor is obtained based on (19) for the estimated scale and shape parameters of UGPD as  ), is a function of the pseudo-polar angular component ˆ , or equivalently Fréchet transformations ˆ and ˆ .The CI of the angular component and the Fréchet transformed variables are functions of the confidence intervals of the scale and shape parameters.Therefore, the transmission rate ( ) is estimated to be within the interval [ , ], considering the confidence intervals of the estimated Pareto parameters.
Referring to ( 6) and ( 7), the upper (lower) bound of the scale parameters ˆ and ˆ along with the lower (upper) bound of the shape parameter ˆ and ˆ provide the lower (upper) bound of the Fréchet transformation parameters ˆ and ˆ , respectively.Since the transmission rate obtained in Section III-B is a function of the CDF of angular component ˆ , and the upper (lower) bound of the ˆ in conjunction with the lower (upper) bound of ˆ , results in the lower (upper) bound of ˆ .Moreover, since the CDF and CDF-inverse of ˆ are monotonically increasing functions, the lower (upper) bound of ˆ affects the lower (upper) bound of the estimated transmission rate ( ) as follows: ) / and −1 ˆ / denotes the lower(L)/upper(U) bound of the estimated transmission rate ( ) and CDFinverse of the estimated Pickands angular component −1 ˆ , respectively.
It is worth noting that although ( ) is a function of (− ˆ / − ˆ / ) and , these parameters do not affect the CI of the transmission rate for the following reasons: For the first parameter, the function included in the formula in (18) cancels the effect of (− ˆ / − ˆ / ); for the second parameter, approaches a fixed value equal to the target error probability for large sample numbers.
The complexity of the proposed MEVT-based rate selection framework is ( ), where and are the number of transmitters and receivers, respectively, and is the number of the training samples for individual channel sequences.

IV. NUMERICAL RESULTS
This section aims to evaluate the proposed rate selection framework compared to the traditional extrapolation-based approaches for a MIMO-URLLC system.In the following, we first provide the benchmark algorithm based on the traditional extrapolation-based approaches and simulation platform in Sections IV-A and IV-B, respectively.Then, in Section IV-C, we evaluate the performance of the proposed MEVT-based rate selection framework in determining the maximum transmission rate for URLLC in spatial diversity, evaluating the system reliability by using the outage probability metric and comparing it to the traditional extrapolation-based method.Finally, in Section IV-D, the estimation of the rate confidence interval is obtained by computing the confidence interval for the parameters of BGPD fitted to the tail distribution of the channel data, iterating over multiple sample sizes and values of the significance level .

A. Benchmark Extrapolation-based Algorithm
In the conventional extrapolation-based method, the distribution of the current channel data is estimated for the reliability order ranging from 10 −3 to 10 0 PER, based on individual channel data sequences [31]- [32].Subsequently, the tail distributions of these sequences are estimated by extrapolating the obtained results towards the ultra-reliable region of 10 −9 to 10 −5 PER [5].After fitting different distributions to the samples in each observation sequence, the optimal distribution identified in our study is the Gaussian distribution, based on the Akaike information criterion/Bayesian information criterion (AIC/BIC) metric [14].In order to estimate the tail distribution of joint probability and establish a fair comparison between our proposed approach with the extrapolation-based method, we apply the Fréchet transformation, and then the Pseudo-polar Pickands transformation to the extrapolated results [33].Accordingly, the probability measure function of the corresponding angular component of Pickands transformation is modeled using the Beta distribution.Finally, the optimum transmission rate associated with the targeted error probability is modeled as where is the extrapolated CDF, ˆ , ˆ are the corresponding Fréchet variables of extrapolated results, is the probability measure function of the angular component of extrapolated results, and † is the maximum allowed error probability for the extrapolation-based method calculated as where

B. Measurement Setup and Simulation Platform
The proposed methodology is implemented on the dataset obtained from the engine compartment of a Fiat Linea vehicle, as depicted in Fig. 2a, utilizing a Vector Network Analyzer (VNA) (R & S ® ZVA67), as shown in Fig. 2b.The VNA is connected to the transmitter and receivers, Rx1 and Rx2, via the R & S ® ZV-Z196 and PE361 port cables, respectively, which have a length of 610 mm.The transmitter is an omnidirectional antenna operating in the frequency range of 58 GHz to 63 GHz, with a nominal gain of 0 dBi.The receivers are horn antennas operating in the frequency range of 50-75 GHz, with a nominal gain of 24 dBi, and horizontal and vertical half power beamwidths of 11 • and 9.5 • , respectively.The locations of the transmitter and receiver antennas are selected out of the possible locations for the wireless sensors located within the engine compartment, namely locations 13 and 2&4 in [34]-Fig.1, respectively, and shown in Fig. 2a.The antennas are interconnected with the coaxial cables via a waveguide that operates within the frequency range of 50-65 GHz.The waveguide exhibits an insertion loss of 0.5dB and impedance of 50Ω.It is worth noting that an omnidirectional antenna at the transmitter along with the directional antenna at two receivers are utilized to consider the usage of diversities in an ultra-reliable communication system and study the inter-relationships of extreme events for a MIMO system under an intra-vehicular communication.MATLAB is used to implement the proposed framework on the collected data.
We categorize the samples from receivers Rx1 and Rx2 into two stationary groups: The first group, 1 , corresponds to driving on the smooth road, and the second group, 2 , corresponds to driving on the ramp.The optimum thresholds for the received power samples from receivers Rx1 and Rx2 are determined for each group of stationary data sequences based on the optimum threshold determination process in Section III-A.Accordingly, for the stationary group 1, the optimum thresholds of Rx1 and Rx2 are −7 dBm and −6.8 dBm, respectively, and for the stationary group 2, the optimum thresholds of Rx1 and Rx2 are −8 dBm and −27 dBm, respectively.In the following, we only present the results of 2 due to the limited number of pages.However, a similar observation has been made in the sample data of 1 .

C. Transmission Rate
Fig. 3 shows the transmission rate of BGPD fitted to the filtered i.i.d.samples of the group 1 at different sample numbers, and error probabilities ∈ {10 −3 , 10 −4 , 10 −5 } compared to the extrapolation-based results.To address the same target error probability , a higher value of optimum transmission rate is obtained based on the proposed MEVTbased framework, up to 10 3 more than the extrapolation-based benchmark algorithm.The gap between their rates increases as the error probability increases.Additionally, according to the results obtained based on the extrapolation method at = 10 −3 , the transmission rate drops significantly after some sample number since † approaches 0, meaning that the maximum allowed error probability † is estimated 0 instead of ∼ 10 −3 which is the target error probability.Besides, the estimated transmission rate is not a function of the sample number for a large number of samples and varies only as a function of the target error probability .These results are consistent with the results of the EVT-based rate selection framework for single-input single-output (SISO)-URC presented in [17] and the extrapolation-based rate selection framework in [6].
The outage probability of the estimated transmission rate for ( ) at different target error probability has also been computed.The outage probability results are the same for the proposed framework and the traditional extrapolation-based method.However, to experience the same outage probability, the traditional method requires a lower transmission rate which is neither power nor time-efficient.Therefore, by applying the proposed MEVT-based rate selection framework, we achieve a higher transmission rate while addressing the ultra-reliable constraint in which the outage probability does not exceed the target error probability.

D. Confidence Interval
Fig. 4 shows the estimated UGPD parameters along with the lower and upper bounds of CIs for = {0.05,0.2, 0.5} and different sample numbers in both receivers Rx1 and Rx2 for stationary group 1.Compared to the standard CI results presented in [21], the confidence interval obtained by the non-bootstrap method is wider in general above a certain number of samples, and the estimated GPD parameters are almost constant.Moreover, it is observed that the MLEs of the scale ( ) and shape ( ) parameters are subject to significant uncertainty when the sample size is limited.Nonetheless, the level of uncertainty reduces progressively with an increase in the sample size.This is consistent with the findings from the standard CI results as reported in [21].This underscores the trade-off between the uncertainty of the GPD parameters and the expenses involved in gathering data.In contrast to the conventional non-parametric technique for rate selection, which mandates noticeably more training on the channel, as outlined in [6], our framework based on MEVT appropriately infers the rate with a certain level of assurance, i.e., CI, using a smaller sample size of around 1/ .Fig. 5 illustrates the empirical CDF of the angular component of Pickands transformation along with the 0.01 confidence interval (the dotted plots), compared to the estimated Beta distribution fitted to the probability measure function of the Pickands' angular coordinate for two different sample numbers. 1 and 2 correspond to the fitted -distribution for the sample numbers 1 ≈ 1.3 × 10 5 and 2 ≈ 5 × 10 5 , respectively, where the estimation of UGPD parameters and the corresponding angular component become stable.It is observed that both 1 and 2 distributions fit the empirical results within the = 0.01 confidence interval.Consequently, utilizing MEVT for modeling the multivariate channel and determining the associated transmission rate can significantly reduce the required number of samples by a minimum of 10-fold compared to traditional extrapolation and data-driven models.bers, where the estimation of UGPD parameters suffers from uncertainty and varies within a wider confidence interval.However, above a certain sample size, the estimation of CIs becomes stable and converges.These observations are in good agreement with what we observed for the SISO system illustrated in [21].

V. CONCLUSIONS
In this paper, we propose a novel framework based on the multivariate extreme value theory with the goal of determining an optimum transmission rate for a system operating in MIMO-URC, and incorporating a confidence interval for the estimated BGPD parameters and the corresponding optimum transmission rate to enable the usage of MEVT in real-time communication.The framework includes the modeling of the inter-relationship of the tail distributions of the multiple channels by using the BGPD based on the Poisson point process approach, determination of the optimum transmission rate by using the estimated BGPD model, incorporation of the confidence intervals for the estimated transmission rate due to the availability of the restricted amount of data, and then assessment of the system reliability by utilizing the outage probability metric.The proposed MEVT-based rate selection framework achieves a higher transmission rate, up to about 10 3 , than the traditional methods based on the extrapolation of average statistic channel models.Incorporating CI into the rate selection estimate can lower the required number of samples, at least by 10-fold, during the training phase necessary to achieve a specific level of reliability.This reduction in required samples subsequently decreases the sample complexity and contributes to a more efficient rate estimation.It has been demonstrated that calculating GPD parameters using limited samples leads to a wider confidence interval in rate estimation, implying a greater degree of uncertainty.Nevertheless, as additional data become accessible, the confidence limits become narrower because of the diminishing level of uncertainty.Moreover, based on the proposed methodology, we demonstrate the trade-off between the uncertainty in the rate selection and the cost of data collection.Since a large amount of data is required for deriving the statistics, our future endeavors encompass the expansion of our research through the implementation of transfer learning techniques, specifically employing knowledge-assisted training.This approach entails the development of a digital twin model that accurately represents a real-time network, incorporating essential elements such as network topology, channel characteristics, and queueing models for offline training purposes.Subsequently, this model is refined and optimized using a smaller dataset in the live operating environment.

ˆ 0 , 2
), where the transformed variable ˆ follows a normal distribution with mean − ˆ 0 and variance 2 , is the median of the distribution of the bootstrap estimates, ˆ factors are constant and estimated based on the CDF of normal distribution and jackknife re-sampling, respectively.The bias correction factor ˆ 0 is estimated as

ˆ 2 6 :
(Line 5).The corresponding acceleration factors of the scale and shape parameters (Line 10) based on(21) are determined by first applying the jackknife method and obtaining − 1 replica of by leaving out the first samples of , denoted by − (Line 6 − 7) and then, estimating the scale and shape parameters of fitted UGPD to each data set − as ˆ and ˆ , respectively (Line 8).Next, the {(1 − ) × 100}% confidence intervals of the scale and shape parameters, denoted by [ ˜ , ˜ ] and [ , ],respectively, are estimated for the significance level based on (22) (Line 12), through the determination of coefficients according to(23) and(24) for the scale and shape parameters (Line 11).The same procedure is applied to the sample set to compute the corresponding CI of the estimated parameters of UGPD fitted to .Algorithm 1 BCA-based CI Determination Algorithm Input: = [ 1 , 2 , ..., ], , and ; Output: ˜ , ˜ , , and ; 1: for b = 1:B do ˆ and ˆ , by fitting the UGPD to the tail samples of ; 4: end for 5: determine the bias correction factors for = 1 : − 1 do 7: leave out the first samples of , as − ; 8: estimate ˆ and ˆ , by fitting UGPD to − ; 9: end for 10: determine acceleration factors ˆ and ˆ ; the CIs of the (1 − ) BCA-CI, [ ˜ , ˜ ] and [ , ], for the UGPD parameters.13: return ˜ , ˜ , , and ; 3) BCA in CI determination of the transmission rate: Upon determining the upper and lower bounds of the estimated CI corresponding to the scale and shape parameters of and , the ˜ , ˜ , ( ), are estimated within the intervals [ ˜ , ˜ ], [ ˜ , ˜ ], [ , ], respectively.The estimated transmission rate, ( denotes the -quantile of the distribution with parameters and fitted to the probability measure function of the angular component for the extrapolated results, and−1 , (,) ˆ , ˆis CDF of distribution with parameters ˆ and ˆ computed at point

Fig. 2 .
Fig. 2. Measurement setup with the transmitter (TX) and receiver (RX) antennas located in the engine compartment of Fiat Linea: (a) Engine compartment, and (b) VNA setup.

Fig. 4 .
Fig. 4. The estimated Pareto parameters along with their CI considering = 0.05, 0.2, 0.5 for stationary group 1, receivers Rx1 and Rx2, at different sample numbers: (a) Scale parameter and the corresponding CI for Rx1, and (b) Shape parameter and the corresponding CI for Rx1; the horizontal black line refer to the −0.5 minimum acceptable value for the shape parameter of UGPD.

Fig. 6 Fig. 5 .
Fig. 5.The CDF of pseudo-polar Pickands angular component based on empirical results along with 0.01 confidence interval, the Beta distribution 1 fitted to the sample numbers ≈ 1.3 × 10 −5 , and the Beta distribution 2 fitted to the sample number ≈ 5 × 10 −5 .